The following maps show asthma prevalence and PM2.5 concentrations across Bay Area counties. The data come from CalEnviroScreen 4.0, released on October 20, 2021, and the maps are drawn at the census-tract level.
(I narrowed the scope to Bay Area counties rather than running a statewide analysis because otherwise my computer could not knit the file for upload to GitHub.)
The asthma map shows the rate of emergency department (ED) visits for asthma per 10,000 people per year, averaged over 2015-2017. The data therefore run only through 2017 and may no longer reflect current conditions.
The PM2.5 map uses CalEnviroScreen's statewide data on fine particle pollution. The values are the annual mean concentration of PM2.5 in µg/m³ (a weighted average of measured monitor concentrations and satellite observations), averaged over the three years from 2015 to 2017. Like the asthma data, they run only through 2017 and may be outdated.
Both maps show disproportionately high asthma rates and PM2.5 levels in certain urban areas, particularly in the East Bay and in the eastern part of the Bay Area counties. However, although PM2.5 levels are high across many urban areas, asthma rates are not always distributed the same way. At first glance the two measures appear at least slightly correlated, so I test that impression with a more rigorous examination.
Below is a scatter plot with PM2.5 on the x-axis and Asthma on the y-axis, with a best-fit line.
The line does not appear to fit the data well, in part because the asthma rates are right-skewed rather than normally distributed. Most points are clustered in the bottom center of the plot, but a few tracts have very high asthma rates, which stretch the distribution upward. The best-fit line shows a slight positive slope.
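For readers who want to reproduce this kind of plot, here is a minimal base-R sketch. It is not the code behind the figure: the data frame `demo` is simulated stand-in data (the real analysis uses `ces4_map`), and the sample size, ranges, and skew are assumptions chosen only so that the example runs on its own. Only the model form, `Asthma ~ PM2.5`, is taken from the regression call in this report.

```r
# Stand-in data only: simulated values, NOT the CalEnviroScreen tract data.
set.seed(1)
n <- 500
demo <- data.frame(PM2.5 = runif(n, min = 5, max = 15))

# Right-skewed response, to mimic a few tracts with very high asthma rates.
demo$Asthma <- 20 + 2 * demo$PM2.5 + rexp(n, rate = 1 / 25)

# Same model form as the report: Asthma regressed on PM2.5.
fit <- lm(Asthma ~ PM2.5, data = demo)

# Scatter plot with the least-squares line drawn on top.
plot(demo$PM2.5, demo$Asthma,
     xlab = "PM2.5 annual mean (micrograms per cubic metre)",
     ylab = "Asthma ED visits per 10,000 people",
     pch = 16, col = "grey40")
abline(fit, col = "blue", lwd = 2)
```

With the real data, replacing `demo` with `ces4_map` should give the same kind of picture, assuming the columns are named `PM2.5` and `Asthma` as in the model call.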
##
## Call:
## lm(formula = Asthma ~ PM2.5, data = ces4_map)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -50.424 -21.485  -6.539  13.432 193.479
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  34.4917     1.6229   21.25   <2e-16 ***
## PM2.5         1.7228     0.1564   11.02   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.34 on 8022 degrees of freedom
## Multiple R-squared: 0.01491, Adjusted R-squared: 0.01479
## F-statistic: 121.4 on 1 and 8022 DF, p-value: < 2.2e-16
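The best-fit line does not represent the data very well. The median residual is -6.539 rather than close to zero, and the residuals are not symmetrically distributed: the largest positive residual is 193.479, while the largest negative one is only -50.424. The standard error of the slope, 0.156, is about 9% of the estimate, so the slope itself is estimated fairly precisely. The poor fit shows up instead in the large residual standard error (30.34) and the very low R-squared.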
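The asymmetry can be put into numbers using only the residual summary printed above. The sketch below is an illustration, not part of the original analysis. The variable names are my own, and Bowley's quartile skewness is a measure I have swapped in for convenience because it needs only the quartiles.

```r
# Five-number summary of the residuals, copied from the lm() output above.
res <- c(min = -50.424, q1 = -21.485, median = -6.539, q3 = 13.432, max = 193.479)

# A symmetric distribution would have tails of similar length
# on both sides of the median.
upper_tail <- res[["max"]] - res[["median"]]   # 200.018
lower_tail <- res[["median"]] - res[["min"]]   # 43.885
tail_ratio <- upper_tail / lower_tail          # about 4.6

# Bowley's quartile skewness: 0 for a symmetric distribution,
# positive for right skew.
bowley <- (res[["q3"]] + res[["q1"]] - 2 * res[["median"]]) /
          (res[["q3"]] - res[["q1"]])          # about 0.14
```

The upper tail is more than four times as long as the lower one, which is consistent with a small number of tracts having very high asthma rates.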
Still, we can glean some information from the model. The linear regression suggests that an increase of 1 µg/m³ in annual mean PM2.5 is associated, on average, with about 1.72 additional asthma ED visits per 10,000 people per year. Only 1.49% of the variation in asthma rates is explained by variation in PM2.5.
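To make the interpretation concrete, the reported coefficients can be plugged in by hand. This is a small sketch using only numbers from the summary output. The helper `predict_asthma` and the example level of 10 µg/m³ are hypothetical, chosen for illustration.

```r
# Coefficients and fit statistics copied from the lm() output above.
intercept <- 34.4917
slope     <- 1.7228
slope_se  <- 0.1564
r_squared <- 0.01491

# Fitted asthma rate (ED visits per 10,000 people per year) at a PM2.5 level.
# predict_asthma is a hypothetical helper, not part of the original analysis.
predict_asthma <- function(pm25) intercept + slope * pm25

predict_asthma(10)                       # 51.7197
predict_asthma(11) - predict_asthma(10)  # 1.7228, i.e. the slope

# t value for the slope; matches the 11.02 in the table up to rounding.
slope / slope_se

# Share of variance explained, as a percentage.
100 * r_squared                          # 1.491
```

The fitted line moves by less than two visits per 10,000 for each additional µg/m³, while the residual standard error is 30.34, which is why a prediction for any individual tract remains very uncertain.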